1.  Introduction.


The first half of the title of this paper is borrowed from the heading “Nearby variables
with nearby laws,” used by Dudley [4, p. 318] in his book to summarize the Strassen-
Dudley theorem: If $F$ and $G$ are distributions on a Polish space which are close in the
Prohorov metric, then these distributions can be realized on some probability space by
random variables $X$ and $Y$ with laws $\mathcal{L}(X) = F$ and $\mathcal{L}(Y) = G$ such that $X$ and $Y$ are
close in probability.

Combining this theorem with Lemma 2.2.2 below, we can restate it in the following
form: Let $X$ be a random variable, defined on a rich enough probability space $(\Omega, \mathcal{S}, P)$,
and with values in a Polish space $B$. Let $G$ be a law on $B$ which is close to the law $\mathcal{L}(X)$
of $X$ in the Prohorov metric. Then there exists a random variable $Y$ defined on $(\Omega, \mathcal{S}, P)$
with law $\mathcal{L}(Y) = G$, and such that $Y$ is close to $X$ in probability.

It is this form of the Strassen-Dudley theorem which is most effective in proving
strong approximation theorems. It will eliminate the need to use such well-known but
somewhat suspicious-looking phrases as: “Without changing its distribution we can redefine
the sequence of random variables on a new probability space on which there exists a
Brownian motion . . . ,” or “Without loss of generality (in the sense of Strassen) there
exists a Brownian motion,” etc. In other words, in these strong approximation theorems
we will be able to keep the given random variables and probability space and we will
construct the approximating sequence on the same probability space.

There is a natural generalization of the Strassen-Dudley theorem to regular conditional
distributions. Let $X$, $B$ and $(\Omega, \mathcal{S}, P)$ be as before and let $\mathcal{G}$ be a countably generated
sub-$\sigma$-field of $\mathcal{S}$. Let $G(\cdot \mid \mathcal{G})$ be a regular conditional distribution on $B$, defined on
$(\Omega, \mathcal{S}, P)$ and measurable with respect to $\mathcal{G}$. Suppose that with high probability the
conditional law $\mathcal{L}(X \mid \mathcal{G})$ of $X$ given $\mathcal{G}$ is close in the Prohorov metric to $G(\cdot \mid \mathcal{G})$.
Then there is a random variable $Y$, defined on $(\Omega, \mathcal{S}, P)$, with conditional law
$\mathcal{L}(Y \mid \mathcal{G}) = G(\cdot \mid \mathcal{G})$ a.s., and such that $Y$ is close to $X$ in probability.

However, as it happens, conditional versions of the Strassen-Dudley theorem are
much more useful if they include assertions about independence: Let $\mathcal{F}$, $\mathcal{G}$ and $\mathcal{H}$ be
sub-$\sigma$-fields of $\mathcal{S}$ with $\mathcal{G}$ and $\mathcal{H}$ being countably generated and $\mathcal{G} \vee \mathcal{H} \subset \mathcal{F}$. Suppose
that with high probability the conditional law $\mathcal{L}(X \mid \mathcal{F})$ is close to $G(\cdot \mid \mathcal{G})$, a regular
conditional distribution on $B$. Then there is a random variable $Y$, defined on $(\Omega, \mathcal{S}, P)$,
which is independent of $\mathcal{H}$ given $\mathcal{G}$, has conditional law $G(\cdot \mid \mathcal{G})$ given $\mathcal{G}$, and is close
to $X$ in probability.

For $\mathbb{R}^d$-valued random variables all these results can be rephrased in terms of
characteristic functions: If $g$ is a characteristic function on $\mathbb{R}^d$ which is close to the
characteristic function of $X$, then [2, Lemma 2.2], combined with the above version of the
Strassen-Dudley theorem, yields a random variable $Y$, defined on $(\Omega, \mathcal{S}, P)$, which is close
to $X$ in probability and has characteristic function $g$. A conditional version of this result
has been known to the workers in this area for a long time. For it was recognized that the
proof of [2, Theorem 1] still works if there $g_k$ is replaced by a conditional characteristic
function $g_k(\cdot \mid \mathcal{G}_{k-1})$ where $\{\mathcal{G}_k, k \ge 1\}$ is a sequence of countably generated $\sigma$-fields
with $\mathcal{G}_k \subset \mathcal{F}_k$. However, since there were no interesting applications apparent, this seemed
a rather useless generalization. As a matter of fact, in light of Remark 2.6 below, more often
than not it is.

The purpose of this paper is fourfold. First, we shall recast the conditional versions
of the above-mentioned theorems in a form which makes them readily applicable and,
moreover, which contains most of the known approximation theorems. Second, we shall
discuss in some detail to what extent these results can be generalized. For example, we
will give a negative answer to the following question. If, in the above notation, with high
probability, $\mathcal{L}(X \mid \mathcal{F})$ is close to $G(\cdot \mid \mathcal{G})$ in the Prohorov metric for some sub-$\sigma$-fields
$\mathcal{F}$ and $\mathcal{G}$, is it always possible to construct a random variable $Y$ with conditional law
$\mathcal{L}(Y \mid \mathcal{G}) = G(\cdot \mid \mathcal{G})$ which is close to $X$ in probability? In our counterexample even
$\mathcal{F} \subset \mathcal{G}$ is satisfied (Remark 2.3). Third, the utility of our results will be demonstrated
in a proof of a new strong approximation theorem for Hilbert space valued martingales.
When properly normalized these converge in law to a mixture of Gaussian distributions.
Finally, we present counterexamples to several reasonable-sounding conjectures on the
strong approximation of martingales. We believe that these together with our Theorem 7
bring the subject to a certain close.

The first strong approximation theorem for martingales can be found in Strassen’s
fundamental paper [12]. Let $\{x_n, \mathcal{L}_n, n \ge 1\}$ be a real-valued martingale difference sequence
with finite second moments. Suppose $V_n := \sum_{k \le n} E\{x_k^2 \mid \mathcal{L}_{k-1}\} \to \infty$ a.s. and that $\{x_n\}$
satisfies a kind of Lindeberg condition. Using the Skorohod embedding theorem Strassen
[12] proved that if the underlying probability space is rich enough then the martingale can
be approximated with probability one by a standard Brownian motion scaled according to
the conditional variances of the given martingale sequence, i.e.

(1.1)  $\sum_{n \ge 1} x_n 1\{V_n \le t\} - B(t) = o(t^{1/2})$  a.s.

The utility of strong (or almost sure) invariance principles, as they are called, is clear.
If the error term in this approximation is small enough then many of the properties of
standard Brownian motion are shared by the given martingale sequence. For instance,
(1.1) implies the functional versions of the CLT and the LIL, but for the upper and lower
class integral test for the LIL an error term $O((t / \log\log t)^{1/2})$ is needed.

Strassen’s theorem was extended in [9] to Hilbert space valued martingales satisfying
a conditional Lindeberg condition slightly stronger than Strassen’s. For simplicity consider
an $\mathbb{R}^d$-valued martingale difference sequence $\{x_k, \mathcal{L}_k, k \ge 1\}$ with conditional covariance
matrices $\sigma_k = E\{x_k x_k^T \mid \mathcal{L}_{k-1}\}$. Set

$A_n = \sum_{k \le n} \sigma_k$,  $V_n = \mathrm{trace}(A_n) = \sum_{k \le n} E\{|x_k|^2 \mid \mathcal{L}_{k-1}\}$.

In [9] (for an improvement see [11]) it is shown that if, in addition to the Lindeberg
condition,

(1.2)  $A_n / V_n \to A$,

where $A$ is a non-random positive semidefinite matrix, then (1.1) continues to hold. (For
the precise statement of condition (1.2), see (3.1.2) below.) But here, in contrast to (1.1),
$B(t)$ is an $\mathbb{R}^d$-valued Brownian motion with mean zero and covariance matrix $A$. Of course,
if $d = 1$ then (1.2) is automatically satisfied with $A = 1$. In [9] an example was presented
to show that for $d > 1$ hypothesis (1.2) cannot be dropped if (1.1) is to hold.

Still assuming $d > 1$ and (1.2) we can rewrite the $d$-dimensional version of (1.1) in the
form

(1.3)  $\sum_{n \ge 1} x_n 1\{V_n \le t\} - A^{1/2} \sum_{n \le t} y_n = o(t^{1/2})$  a.s.


where $\{y_n, n \ge 1\}$ is a sequence of i.i.d. standard Gaussian $\mathbb{R}^d$-valued random variables.
In Theorem 7, Section 3 below, (1.3) is established under hypothesis (1.2), but weakened
to allow $A$ to be a random covariance matrix, measurable with respect to some $\mathcal{L}_k$, $k \ge 1$.
In other words, we shall construct a sequence $\{y_n, n \ge 1\}$ of i.i.d. standard Gaussian
$\mathbb{R}^d$-valued random variables, independent of $A$, such that (1.3) holds. On the other hand, as
we show by example in Section 3.3, without the assumption that $A$ be $\mathcal{L}_k$-measurable for
some finite $k$, (1.3) need not hold.

The more general version of (1.3) with random $A$ is still useful, because it shows, for
instance, that the martingale normalized by $t^{-1/2}$ converges in law to a mixture of Gaussian
distributions. But it also implies, via a Fubini argument, the laws of the iterated logarithm
and their upper and lower class refinements.

As to the methodology we indicated above that Theorem 3 below is the basis for our
method. However, one might ask in this context whether or not other established methods,
such as the Skorohod embedding theorem, or rather a vector-valued version of it, could
possibly be used, instead of Theorem 3, to prove strong approximation theorems for
vector-valued martingales. In [8] we argued, no doubt very persuasively, that the canonical
process in which to embed a general $\mathbb{R}^d$-valued martingale must be an $\mathbb{R}^d$-valued Gaussian
process $X$ indexed by $C \in \mathcal{C}$, the class of positive semidefinite $d \times d$ matrices, with the
following properties:

(i)  $X(C)$ is Gaussian with mean zero and covariance matrix $C$ for each $C \in \mathcal{C}$,

(ii)  $X$ has independent increments, i.e., the vectors $X(C_1)$, $X(C_1 + C_2) - X(C_1)$, . . . ,
$X(C_1 + \cdots + C_{n-1} + C_n) - X(C_1 + \cdots + C_{n-1})$ are independent for all $n \ge 1$ and all
$C_1, \ldots, C_n \in \mathcal{C}$.

After building a strong case in support of this process we showed that for $d > 1$ it
does not exist [8].

2.  Nearby variables with nearby conditional laws.

2.1.  Statement  of  results. 

For convenience we introduce some notation. $U$ will denote a random variable (defined
on the underlying probability space) that is uniformly distributed over $[0, 1]$. Also $G(\cdot \mid \mathcal{G})$
will denote a regular conditional distribution, measurable with respect to the sigma-field
$\mathcal{G}$ under consideration. If $G(\cdot \mid \mathcal{G})$ is such a distribution on $\mathbb{R}^d$ we define its conditional
characteristic function as

(2.1.1)  $g(u \mid \mathcal{G}) = \int_{\mathbb{R}^d} \exp(i\langle u, x\rangle)\, G(dx \mid \mathcal{G})$.

Here $\langle u, x\rangle$ denotes the inner product of the vectors $u$ and $x$.

Theorem 1. Let $X$ be an $\mathbb{R}^d$-valued random variable defined on some probability space
$(\Omega, \mathcal{S}, P)$ and let $\mathcal{G}$ be a countably generated sub-$\sigma$-field of $\mathcal{S}$. Assume that there exists
a random variable $U$ that is independent of the $\sigma$-field $\mathcal{G} \vee \sigma(X)$. (This makes the
probability space rich enough.) Let $G(\cdot \mid \mathcal{G})$ be a regular conditional distribution on $\mathbb{R}^d$ with
conditional characteristic function $g(\cdot \mid \mathcal{G})$ as defined in (2.1.1). Suppose that for some
non-negative numbers $\lambda$, $\delta$ and $T \ge 10^8 d$,

(2.1.2)  $\int_{|u| \le T} E\,\bigl|E\{\exp(i\langle u, X\rangle) \mid \mathcal{G}\} - g(u \mid \mathcal{G})\bigr|\, du \le \lambda (2T)^d$

and  that 

(2.1.3)  $E\{G(\{x : |x| > \tfrac{1}{4} T\} \mid \mathcal{G})\} \le \delta$.

Then there exists an $\mathbb{R}^d$-valued random variable $Y$ on $(\Omega, \mathcal{S}, P)$, with the following
properties:

(2.1.4)  $G(\cdot \mid \mathcal{G})$ is a conditional distribution of $Y$ given $\mathcal{G}$

and

(2.1.5)  $P(|X - Y| > \alpha) \le \alpha$

where

(2.1.6)  $\alpha = 16 d T^{-1} \log T + 2\lambda^{1/2} T^d + 2\delta^{1/2}$.


Theorem 1 is equivalent to the following theorem which is more convenient to apply.

Theorem 2. Let $(\Omega, \mathcal{S}, P)$ be a probability space and let $\mathcal{F}$, $\mathcal{G}$, $\mathcal{H}$ and $\mathcal{L}$ be sub-$\sigma$-fields of
$\mathcal{S}$ such that $\mathcal{G} \vee \mathcal{H} \subset \mathcal{F} \subset \mathcal{L}$. Assume that $\mathcal{G}$ and $\mathcal{H}$ are countably generated. Let $X$
be an $\mathbb{R}^d$-valued random variable defined on $(\Omega, \mathcal{S}, P)$ and measurable with respect to $\mathcal{L}$.
Assume that there is a random variable $U$ that is independent of $\mathcal{L}$. Let $G(\cdot \mid \mathcal{G})$ be a
regular conditional distribution on $\mathbb{R}^d$ with conditional characteristic function $g(\cdot \mid \mathcal{G})$ as
defined in (2.1.1). Suppose that for some non-negative numbers $\lambda$, $\delta$ and $T \ge 10^8 d$

(2.1.7)  $\int_{|u| \le T} E\,\bigl|E\{\exp(i\langle u, X\rangle) \mid \mathcal{F}\} - g(u \mid \mathcal{G})\bigr|\, du \le \lambda (2T)^d$

and that (2.1.3) holds.

Then there exists an $\mathbb{R}^d$-valued random variable $Y$, defined on $(\Omega, \mathcal{S}, P)$, measurable
with respect to $\mathcal{L} \vee \sigma(U)$ such that (2.1.5) and (2.1.6) hold and having the following
property.

(2.1.8)  $G(\cdot \mid \mathcal{G})$ is a conditional distribution of $Y$ given $\mathcal{G} \vee \mathcal{H}$. In particular,
$Y$ is conditionally independent of $\mathcal{H}$ given $\mathcal{G}$.

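Remark 2.1. Theorem 2 is an easy consequence of Theorem 1. Since $\mathcal{G} \vee \mathcal{H} \subset \mathcal{F}$ we can
replace in (2.1.7) $\mathcal{F}$ by $\mathcal{G} \vee \mathcal{H}$. This follows from [2, Lemma 2.6]. We reinterpret $G(\cdot \mid \mathcal{G})$
as $G(\cdot \mid \mathcal{G} \vee \mathcal{H})$. Thus we can apply Theorem 1 with $\mathcal{G} \vee \mathcal{H}$ in place of $\mathcal{G}$. Notice that
$\mathcal{G} \vee \mathcal{H}$ is countably generated since $\mathcal{G}$ and $\mathcal{H}$ are. We then obtain an $\mathbb{R}^d$-valued random
variable $Y$ satisfying (2.1.5) and (2.1.6) and such that $G(\cdot \mid \mathcal{G})$ is a conditional distribution
of $Y$ given $\mathcal{G} \vee \mathcal{H}$.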

Remark 2.2. The following non-symmetric form of Theorem 1 may prove useful. Assume
the hypotheses of Theorem 1 with $T > 0$ (only) and let $r > 0$. Then the conclusion of
Theorem 1 remains valid with (2.1.5) and (2.1.6) replaced respectively by

$P(|X - Y| > r) \le a(r, T) + 2\lambda^{1/2} T^d + 2\delta^{1/2}$

where


(2.1.9)  a(r,4)<  (  3(4'*)  ^  ',rT^'  lfr-T 

l  3<2T)' exp  (-£!”),  ifr>T. 

Theorem  2  can  be  reformulated  in  the  same  way.  Proofs  will  be  sketched  in  Section  2.2.3. 

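Remark 2.3. The condition $\mathcal{G} \subset \mathcal{F}$ cannot be omitted. In fact we shall give an example of
a random variable $X$ with characteristic function $m$, defined on $([0, 1), \mathcal{B}, \lambda)$, and a family
$\{g_\varepsilon(\cdot \mid \mathcal{G}), 0 < \varepsilon < \tfrac{1}{2}\}$ of conditional characteristic functions with respect to a $\sigma$-field $\mathcal{G}$
with the following properties: For all $\omega \in [0, 1)$ and all $|u| \le 1/\varepsilon$

$|m(u) - g_\varepsilon(u \mid \mathcal{G})(\omega)| \le \varepsilon$,

yet any random variable $Y_\varepsilon$ with conditional characteristic function
$E(\exp(iuY_\varepsilon) \mid \mathcal{G}) = g_\varepsilon(u \mid \mathcal{G})$ is bounded by 2 and satisfies

$P(|X - Y_\varepsilon| > 1) = \tfrac{1}{2}$.

Thus whereas conditions (2.1.3) and (2.1.7) hold, (2.1.6) does not. The example is as
follows: We choose $X(\omega) = r_1(\omega)$, the first Rademacher function (recall $r_1(\omega) = 1$ for
$0 \le \omega < \tfrac{1}{2}$, $r_1(\omega) = -1$ for $\tfrac{1}{2} \le \omega < 1$), $\mathcal{G} = \{\emptyset, [0, 1), [0, \tfrac{1}{2}), [\tfrac{1}{2}, 1)\}$,
$\mathcal{F} = \{\emptyset, [0, 1)\}$, and

$g_\varepsilon(u \mid \mathcal{G})(\omega) = \exp(i\varepsilon^2 u) \cos u$,   $0 \le \omega < \tfrac{1}{2}$,
$g_\varepsilon(u \mid \mathcal{G})(\omega) = \exp(-i\varepsilon^2 u) \cos u$,  $\tfrac{1}{2} \le \omega < 1$.

Now a random variable $Y_\varepsilon$ with conditional characteristic function $g_\varepsilon(\cdot \mid \mathcal{G})$ must be of the
form $Y_\varepsilon = \varepsilon^2 r_1 + r$ where $r = 1$ on some sets $A$ and $B$, say, with $A \subset [0, \tfrac{1}{2})$, $B \subset [\tfrac{1}{2}, 1)$
and $\lambda(A) = \lambda(B) = \tfrac{1}{4}$, and $r = -1$ on $[0, 1) \setminus (A \cup B)$. In other words, $r$ is independent of
$r_1$ (and of $\mathcal{G}$).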
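
The two claims of this example can be checked numerically. The sketch below is ours, not part of the source: the function names are hypothetical, and the range $|u| \le 1/\varepsilon$ is our reading of the remark. It evaluates $|m(u) - g_\varepsilon(u \mid \mathcal{G})|$ with $m(u) = \cos u$ on a grid, and enumerates the four equally likely values of $(r_1, r)$ to compute the probability that $X = r_1$ and $Y_\varepsilon = \varepsilon^2 r_1 + r$ are at least 1 apart.

```python
import cmath
import itertools
import math


def cf_gap(eps, u, sign):
    """|m(u) - g_eps(u | G)| on the half of [0, 1) where r_1 = sign."""
    m = math.cos(u)  # characteristic function of the Rademacher variable r_1
    g = cmath.exp(1j * sign * eps ** 2 * u) * math.cos(u)
    return abs(m - g)


def max_cf_gap(eps, grid=2000):
    """Largest gap over a grid of |u| <= 1/eps (assumed range) and both halves."""
    t = 1.0 / eps
    return max(
        cf_gap(eps, t * (2 * i / grid - 1), sign)
        for i in range(grid + 1)
        for sign in (1, -1)
    )


def prob_distance_at_least_one(eps):
    """P(|X - Y_eps| >= 1) for X = r_1, Y_eps = eps^2 r_1 + r, r independent of r_1."""
    outcomes = list(itertools.product((1, -1), repeat=2))  # (r_1, r), equally likely
    hits = sum(abs(r1 - (eps ** 2 * r1 + r)) >= 1 for r1, r in outcomes)
    return hits / len(outcomes)
```

The conditional characteristic functions are uniformly within $\varepsilon$ of $m$, yet $X$ and $Y_\varepsilon$ disagree by almost 2 on half of the space.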

Remark 2.4. We conclude the discussion of Theorems 1 and 2 with the following observation.
Let $X$ and $Y$ be random variables which are almost independent, say

(2.1.10)  $|E e^{iuX + ivY} - E e^{iuX} E e^{ivY}|$

is small for all $|u| \le T$, $|v| \le T$; suppose that $X$ is bounded. Then according to [2,
Lemma 2.2] (= Theorem 1 with $\mathcal{G} = \{\emptyset, \Omega\}$) there exist independent random variables $X^*$
and $Y^*$, close to $X$ and $Y$ respectively and such that $\mathcal{L}(X^*) = \mathcal{L}(X)$, $\mathcal{L}(Y^*) = \mathcal{L}(Y)$.

Unfortunately, in general, we cannot choose $X^* = X$. In other words the following
assertion is false: There exists a random variable $Y^*$ independent of $X$, close to $Y$ and with
$\mathcal{L}(Y^*) = \mathcal{L}(Y)$. Let $0 < \varepsilon < 1$, let $r$ assume the values $+1$ and $-1$, each with probability
$\tfrac{1}{2}$, and let $X = \varepsilon r$ and $Y = r$. Then for all $|u| \le \varepsilon^{-1/2}$ and all $v$, (2.1.10) is bounded by
$\varepsilon^{1/2}$. But any $Y^*$ independent of $X$, with $\mathcal{L}(Y^*) = \mathcal{L}(Y)$, is also independent of $Y$ and so
$P(|Y - Y^*| > 1) = \tfrac{1}{2}$.
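
The counterexample in this remark is small enough to verify directly. The following sketch is ours (the function names are hypothetical): it computes the quantity (2.1.10) exactly for $X = \varepsilon r$, $Y = r$ with $r$ a fair sign, where it equals $|\sin(\varepsilon u) \sin v|$, takes the maximum over a grid of $|u| \le \varepsilon^{-1/2}$ and one full period of $v$, and enumerates the joint values of two independent fair signs to obtain the probability that they differ.

```python
import cmath
import itertools
import math


def dependence_gap(eps, u, v):
    """|E exp(iuX + ivY) - E exp(iuX) E exp(ivY)| for X = eps*r, Y = r, r = +-1 w.p. 1/2."""
    signs = (1, -1)
    joint = sum(cmath.exp(1j * (u * eps * r + v * r)) for r in signs) / 2
    e_x = sum(cmath.exp(1j * u * eps * r) for r in signs) / 2
    e_y = sum(cmath.exp(1j * v * r) for r in signs) / 2
    return abs(joint - e_x * e_y)


def worst_gap(eps, grid=200):
    """Maximum of the gap over |u| <= eps**-0.5 and v in [-pi, pi] (one period)."""
    t = eps ** -0.5
    us = [t * (2 * i / grid - 1) for i in range(grid + 1)]
    vs = [math.pi * (2 * j / grid - 1) for j in range(grid + 1)]
    return max(dependence_gap(eps, u, v) for u in us for v in vs)


def prob_far_apart():
    """P(|Y - Y*| >= 1) when Y and Y* are independent fair +-1 signs."""
    pairs = list(itertools.product((1, -1), repeat=2))
    return sum(abs(y - y_star) >= 1 for y, y_star in pairs) / len(pairs)
```

Since $|Y - Y^*|$ only takes the values 0 and 2, the events $\{|Y - Y^*| \ge 1\}$ and $\{|Y - Y^*| > 1\}$ coincide here.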

Repeated applications of Theorem 2 yield the following result. Note that the existence
of one random variable $U$ independent of $\bigvee_{k \ge 0} \mathcal{F}_k$ implies the existence of a whole
sequence $\{U_k, k \ge 1\}$ independent of $\bigvee_{k \ge 0} \mathcal{F}_k$.
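
One standard way to obtain several uniform variables from a single one is to deal out the binary digits of $U$ in turn. The text does not say which construction is intended, so the sketch below is an assumption on our part. It is also only a finite-precision illustration with hypothetical names: it produces `m` values from `m * bits_each` digits, whereas an infinite sequence would need a bijective enumeration of digit positions.

```python
def binary_digits(u, n_bits):
    """First n_bits binary digits of u in [0, 1)."""
    digits = []
    for _ in range(n_bits):
        u *= 2
        bit = int(u)  # 0 or 1
        digits.append(bit)
        u -= bit
    return digits


def split_uniform(u, m, bits_each=20):
    """Deal the digits of u round-robin into m new numbers in [0, 1).

    Digit j, j + m, j + 2m, ... of u become the digits of the j-th output.
    If u is uniform, disjoint digit blocks are independent fair bits, so
    the outputs are (up to truncation) independent and uniform.
    """
    digits = binary_digits(u, m * bits_each)
    outputs = []
    for j in range(m):
        own_digits = digits[j::m]
        outputs.append(sum(b / 2 ** (i + 1) for i, b in enumerate(own_digits)))
    return outputs
```

For example, `split_uniform(0.625, 2)` splits the digit string 0.101 into 0.11 and 0.0.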

Theorem 3. Let $\{X_k, k \ge 1\}$ be a sequence of random variables with values in $\mathbb{R}^{d_k}$, $k \ge 1$,
and defined on some probability space $(\Omega, \mathcal{S}, P)$. Let $\{\mathcal{F}_k, k \ge 0\}$ be a non-decreasing
sequence of sub-$\sigma$-fields of $\mathcal{S}$ such that $X_k$ is $\mathcal{F}_k$-measurable for each $k \ge 1$. Let
$\{\mathcal{H}_k, k \ge 1\}$ be a sequence of countably generated $\sigma$-fields with $\mathcal{H}_k \subset \mathcal{F}_k$, $k \ge 1$, and let
$\mathcal{G} \subset \mathcal{F}_0$ be a countably generated $\sigma$-field. Assume that there exists a random variable $U$
that is independent of $\bigvee_{k \ge 0} \mathcal{F}_k$. For each $k \ge 1$, let $G_k(\cdot \mid \mathcal{G})$ be a regular conditional
distribution on $\mathbb{R}^{d_k}$, measurable with respect to $\mathcal{G}$, and with conditional characteristic function

$g_k(u \mid \mathcal{G}) = \int \exp(i\langle u, x\rangle)\, G_k(dx \mid \mathcal{G})$,  $u \in \mathbb{R}^{d_k}$.

Suppose that for some non-negative numbers $\lambda_k$, $\delta_k$ and $T_k \ge 10^8 d_k$

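$\int_{|u| \le T_k} E\,\bigl|E\{\exp(i\langle u, X_k\rangle) \mid \mathcal{F}_{k-1}\} - g_k(u \mid \mathcal{G})\bigr|\, du \le \lambda_k (2T_k)^{d_k}$

and that

$E\{G_k(\{x : |x| > \tfrac{1}{4} T_k\} \mid \mathcal{G})\} \le \delta_k$.
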
Then there exists a sequence $\{Y_k, k \ge 1\}$ of $\mathbb{R}^{d_k}$-valued random variables, defined on
$(\Omega, \mathcal{S}, P)$ with the following properties:

(2.1.11)  $Y_k$ is $\mathcal{F}_k \vee \sigma(U)$ measurable for each $k \ge 1$,

(2.1.12)  $G_k(\cdot \mid \mathcal{G})$ is a conditional distribution of $Y_k$ given $\mathcal{G} \vee \mathcal{H}_{k-1}$,
in particular, $Y_k$ is conditionally independent of $\mathcal{H}_{k-1}$ given $\mathcal{G}$,

and

$P(|X_k - Y_k| > \alpha_k) \le \alpha_k$

where

$\alpha_k = 16 d_k T_k^{-1} \log T_k + 2\lambda_k^{1/2} T_k^{d_k} + 2\delta_k^{1/2}$,  $k \ge 1$.

In particular, if we choose inductively $\mathcal{H}_k = \sigma(Y_1, \ldots, Y_k)$, $k \ge 1$, then $\{Y_k, k \ge 1\}$
can be chosen to be a sequence of random variables conditionally independent given $\mathcal{G}$.

Remark 2.5. If $\mathcal{G}$ can be chosen to be the trivial $\sigma$-field then, except for the exponent $\tfrac{1}{2}$
on $\delta_k$, Theorem 3 reduces to [2, Theorem 1]. In particular, $\{Y_k, k \ge 1\}$ is a sequence of
independent random variables with $\mathcal{L}(Y_k) = G_k$, $k \ge 1$.

Remark 2.6. We want to spare the reader a complete report on the pitfalls that
generalizations of Theorem 3 may have, except for this one: Let $\{\mathcal{G}_k, k \ge 1\}$ be a sequence of
countably generated $\sigma$-fields $\mathcal{G}_k \subset \mathcal{F}_k$, $k \ge 1$. The proof of Theorem 3 still works if we
assume that $G_k$ and $g_k$ are $\mathcal{G}_{k-1}$-measurable instead of $\mathcal{G}$-measurable. If, in addition, we
set $\mathcal{H}_k := \sigma(Y_1, \ldots, Y_k)$ then the conclusion of (2.1.12) reads:

$G_k(\cdot \mid \mathcal{G}_{k-1})$ is a conditional distribution of $Y_k$ given
$\sigma(Y_1, \ldots, Y_{k-1}) \vee \mathcal{G}_{k-1}$, in particular, $Y_k$ is conditionally
independent of $Y_1, \ldots, Y_{k-1}$ given $\mathcal{G}_{k-1}$, $k \ge 1$.

Unfortunately, in general, this does not specify the joint distribution of the sequence
$\{Y_k, k \ge 1\}$, as the following example shows. In comparing this with Remark 2.5 this
paradoxically seems to say that more information in fact yields less information. Let
$N_0$, $N_1$, and $N_2$ be independent standard normal random variables, let $0 < p < 1$ and
set $Y_1 := pN_0 + (1 - p^2)^{1/2} N_1$, $Y_2 := N_0 + N_2$. Let $\mathcal{G} = \sigma(N_0)$. Then the conditional
distribution $G_2(\cdot \mid \mathcal{G})$ of $Y_2$ given $\mathcal{G}$ is normal $N(N_0, 1)$. The conditional distribution of $Y_2$
given $\mathcal{G} \vee \sigma(Y_1)$ is also $N(N_0, 1)$. Thus $Y_2$ is conditionally independent of $Y_1$ given $\mathcal{G}$. Yet
this does not determine the joint distribution of $Y_1$ and $Y_2$ since $p$ is arbitrary.
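
Because $(N_0, Y_1, Y_2)$ is jointly Gaussian, the claims in this example reduce to a two-by-two linear system. The sketch below is ours, with hypothetical names. It applies the usual Gaussian conditioning formulas to the covariance entries computed by hand from the definitions: $\mathrm{Var}\, Y_1 = 1$, $\mathrm{Cov}(N_0, Y_1) = \mathrm{Cov}(Y_1, Y_2) = p$, $\mathrm{Cov}(N_0, Y_2) = 1$, $\mathrm{Var}\, Y_2 = 2$.

```python
def conditional_law_of_y2(p):
    """Regress Y2 = N0 + N2 on (N0, Y1), where Y1 = p*N0 + sqrt(1 - p^2)*N1.

    Returns (b0, b1, cond_var, cov_y1_y2): the conditional mean of Y2 given
    (N0, Y1) is b0*N0 + b1*Y1 and its conditional variance is cond_var.
    Requires 0 < p < 1 so that the 2x2 covariance of (N0, Y1) is invertible.
    """
    s00, s01, s11 = 1.0, p, 1.0  # Var N0, Cov(N0, Y1), Var Y1
    c0, c1 = 1.0, p              # Cov(N0, Y2), Cov(Y1, Y2)
    var_y2 = 2.0
    det = s00 * s11 - s01 ** 2
    # Solve the 2x2 normal equations with the explicit inverse.
    b0 = (s11 * c0 - s01 * c1) / det
    b1 = (-s01 * c0 + s00 * c1) / det
    cond_var = var_y2 - (b0 * c0 + b1 * c1)
    return b0, b1, cond_var, c1
```

For every admissible $p$ the coefficients come out as $(1, 0)$ with conditional variance 1, that is, the conditional law is $N(N_0, 1)$, while the covariance of $Y_1$ and $Y_2$ is $p$ and so changes with $p$.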

Theorems 1, 2, and 3 apply to a wide variety of dependence structures including
random variables which satisfy a strong mixing condition. In the following three theorems the
dependence relation is more restrictive than the one implicit in (2.1.7), but the random
variables are allowed to assume values in a Polish space. For earlier versions see [11,
Theorem 4] and its history given there. Given a Polish space $(B, m)$, a set $A \subset B$ and $\rho \ge 0$
we write $A^{\rho]} = \{x : \inf\{m(x, y) : y \in A\} \le \rho\}$. As before, $U$ is a random variable, defined
on the underlying probability space, that is uniformly distributed over $[0, 1]$. Moreover,
$G(\cdot \mid \mathcal{G})$ will denote a regular conditional distribution on $\mathcal{B}$, the Borel sigma-field on $(B, m)$,
such that $G(\cdot \mid \mathcal{G})$ is measurable with respect to the sigma-field $\mathcal{G}$ under consideration.

Theorem 4. Let $X$ be a random variable, defined on some probability space $(\Omega, \mathcal{S}, P)$ and
with values in some Polish space $(B, m)$. Let $\mathcal{G}$ be a countably generated sub-$\sigma$-field of
$\mathcal{S}$ and assume that there exists a random variable $U$ that is independent of the $\sigma$-field
$\mathcal{G} \vee \sigma(X)$. Let $G(\cdot \mid \mathcal{G})$ be a regular conditional distribution on $\mathcal{B}$ and suppose that for
some non-negative numbers $\alpha$ and $\beta$

(2.1.13)  $E \sup_{A \in \mathcal{B}} \{P(X \in A \mid \mathcal{G}) - G(A^{\alpha]} \mid \mathcal{G})\} \le \beta$.

Then there exists a random variable $Y$ with values in $B$, defined on $(\Omega, \mathcal{S}, P)$ and satisfying
(2.1.4) and

(2.1.14)  $P\{m(X, Y) > \alpha\} \le \beta$.

Remark 2.7. Notice that here as well as in the following two theorems the constants are
sharp. Moreover, if in (2.1.13) $\mathcal{G}$ is the trivial $\sigma$-field then Theorem 4 reduces to the
Strassen-Dudley theorem.
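
In the unconditional case, the left-hand side of (2.1.13) can be evaluated by brute force when both laws have finite support on the real line. The sketch below is ours: the names and the toy laws in the usage note are hypothetical. It uses the observation that, for a law with finite support, the supremum over Borel sets is attained on a subset of that support, because shrinking $A$ to the support keeps $P(A)$ and can only shrink $A^{\alpha]}$.

```python
from itertools import combinations


def closed_neighbourhood_mass(g, a_set, alpha):
    """G(A^{alpha]}): mass that g puts within distance alpha of the finite set a_set."""
    return sum(
        mass for y, mass in g.items()
        if any(abs(x - y) <= alpha for x in a_set)
    )


def strassen_sup(p, g, alpha):
    """sup over A of P(A) - G(A^{alpha]}) for finitely supported laws p, g on the line.

    p and g map support points to probabilities. Only subsets of the support
    of p are enumerated, so this is exponential in the support size.
    """
    support = list(p)
    best = 0.0  # the empty set contributes 0
    for size in range(1, len(support) + 1):
        for a_set in combinations(support, size):
            value = sum(p[x] for x in a_set) - closed_neighbourhood_mass(g, a_set, alpha)
            best = max(best, value)
    return best
```

For instance, with `p = {0.0: 0.5, 1.0: 0.5}`, `g = {0.1: 0.5, 2.0: 0.5}` and `alpha = 0.2` the supremum is 0.5, matching the coupling that pairs 0 with 0.1 and 1 with 2, for which $P\{|X - Y| > 0.2\} = 0.5$.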

Theorem 4 is equivalent to the following theorem.

Theorem 5. Let $(\Omega, \mathcal{S}, P)$ be a probability space and let $\mathcal{F}$, $\mathcal{G}$, $\mathcal{H}$ and $\mathcal{L}$ be sub-$\sigma$-fields
of $\mathcal{S}$ such that $\mathcal{G} \vee \mathcal{H} \subset \mathcal{F} \subset \mathcal{L}$. Assume that $\mathcal{G}$ and $\mathcal{H}$ are countably generated. Let $X$ be
a random variable, defined on $(\Omega, \mathcal{S}, P)$ and with values in some Polish space $(B, m)$, and
measurable with respect to $\mathcal{L}$. Assume that there exists a random variable $U$ independent
of $\mathcal{L}$. Moreover, let $G(\cdot \mid \mathcal{G})$ be a regular conditional distribution on $\mathcal{B}$ and suppose that
for some non-negative numbers $\alpha$ and $\beta$

$E \sup_{A \in \mathcal{B}} \{P(X \in A \mid \mathcal{F}) - G(A^{\alpha]} \mid \mathcal{G})\} \le \beta$.

Then there exists a random variable $Y$ with values in $B$, defined on $(\Omega, \mathcal{S}, P)$, measurable
with respect to $\mathcal{L} \vee \sigma(U)$ and such that (2.1.8) and (2.1.14) hold.

Repeated  applications  of  Theorem  5  yield  the  following  result. 

Theorem 6. Let $\{B_k, m_k, k \ge 1\}$ be a sequence of Polish spaces, let $\mathcal{B}_k$ denote the Borel
field of $B_k$, and let $\{X_k, k \ge 1\}$ be a sequence of random variables, defined on $(\Omega, \mathcal{S}, P)$
and with $X_k$ assuming values in $B_k$. Let $\{\mathcal{F}_k, k \ge 0\}$ be a non-decreasing sequence of
sub-$\sigma$-fields of $\mathcal{S}$ such that $X_k$ is $\mathcal{F}_k$-measurable for each $k \ge 1$. Let $\{\mathcal{H}_k, k \ge 1\}$ be
a sequence of countably generated $\sigma$-fields with $\mathcal{H}_k \subset \mathcal{F}_k$, $k \ge 1$, and let $\mathcal{G} \subset \mathcal{F}_0$ be a
countably generated $\sigma$-field. Assume there exists a random variable $U$ that is independent
of $\bigvee_{k \ge 0} \mathcal{F}_k$. For each $k \ge 1$, let $G_k(\cdot \mid \mathcal{G})$ be a regular conditional distribution on $\mathcal{B}_k$,
measurable with respect to $\mathcal{G}$. Suppose there exist two sequences of real numbers $\{\alpha_k\}$
and $\{\beta_k\}$ such that for all $k \ge 1$

$E \sup_{A \in \mathcal{B}_k} \{P(X_k \in A \mid \mathcal{F}_{k-1}) - G_k(A^{\alpha_k]} \mid \mathcal{G})\} \le \beta_k$.

Then there exists a sequence $\{Y_k, k \ge 1\}$ of random variables, defined on $(\Omega, \mathcal{S}, P)$ and
with $Y_k$ assuming values in $B_k$ such that (2.1.11) and (2.1.12) hold. Moreover, for all
$k \ge 1$,

$P\{m_k(X_k, Y_k) > \alpha_k\} \le \beta_k$.

In particular, if we choose $\mathcal{H}_k = \sigma(Y_1, \ldots, Y_k)$ for $k \ge 1$ then $\{Y_k, k \ge 1\}$ can be
chosen to be a sequence of random variables conditionally independent given $\mathcal{G}$.

Remark 2.8. In a recent paper [7] Eberlein embarks on a project similar to ours, namely
to establish conditions for the approximation of a given sequence $\{X_k, k \ge 1\}$ by another
(possibly dependent) sequence $\{Y_k, k \ge 1\}$ of prescribed distribution. In our view
Eberlein’s attempt has failed. For he imposes conditions on the sequence $\{X_k, k \ge 1\}$ so strong
that these guarantee that $\{X_k, k \ge 1\}$ can be approximated by a sequence $\{Y_k, k \ge 1\}$
of independent random variables, a case which is entirely in the domain of attraction of
previous work (Section 2.4).

2.2.  Proof  of  Theorem  1. 

The following lemma gives a random variable $Z$ for which $G(\cdot \mid \mathcal{G})$ is the conditional
distribution of $Z$ given $\mathcal{G}$. Thus from the class of all such random variables $Z$ we are to
choose one, say $Y$, which in addition satisfies (2.1.5) and (2.1.6).

Lemma 2.2.1. Let $(\Omega, \mathcal{S}, P)$ be a probability space with a sub-$\sigma$-field $\mathcal{G} \subset \mathcal{S}$. Let $G(\cdot, \omega)$ be
a $\mathcal{G}$-measurable, regular conditional distribution on a Polish space $S$. Let $U$ be a random
variable uniformly distributed over $[0, 1]$ and independent of $\mathcal{G}$. Then there exists an
$S$-valued random variable $Z$ such that $G(\cdot, \omega)$ is a regular conditional distribution of $Z$ given
$\mathcal{G}$.

Proof. Without loss of generality we can assume that $S = [0, 1]$ with the usual metric and
Borel structure. (See e.g. the proof of [5, Lemma 2.11].) For $0 < u < 1$ define

$G^{-1}(u, \omega) = \inf\{t : G(t, \omega) \ge u\}$.

Then $G^{-1}$ is jointly measurable since the map $u \to G^{-1}(u, \omega)$ is left-continuous and since
for fixed $u$ and $t$

$\{\omega : G^{-1}(u, \omega) \le t\} = \{\omega : G(t, \omega) \ge u\} \in \mathcal{G}$.

The desired random variable is given by

$Z(\omega) := G^{-1}(U(\omega), \omega)$.
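
The construction in this proof is the familiar quantile transform, applied separately for each $\omega$. The sketch below is ours and only illustrative: `KERNEL` is a hypothetical two-point example in which the conditioning information takes the values `"a"` and `"b"`, and each conditional law lives on $\{0, 1\}$. Feeding an evenly spaced grid of $u$-values through $G^{-1}$ recovers the prescribed conditional masses.

```python
def generalized_inverse(cdf_points, u):
    """inf{t : G(t) >= u} for a CDF given as sorted (t, G(t)) pairs on a finite support."""
    for t, g in cdf_points:
        if g >= u:
            return t
    return cdf_points[-1][0]


# Hypothetical kernel: for each value of the conditioning variable, the CDF of
# a law on {0, 1}. Given "a" the mass at 0 is 0.25; given "b" it is 0.75.
KERNEL = {
    "a": [(0.0, 0.25), (1.0, 1.0)],
    "b": [(0.0, 0.75), (1.0, 1.0)],
}


def z(omega, u):
    """Z = G^{-1}(U, omega), with U = u independent of the conditioning variable."""
    return generalized_inverse(KERNEL[omega], u)


def conditional_mass_at_zero(omega, n=10000):
    """Fraction of an evenly spaced grid of u-values in (0, 1) sent to the point 0."""
    us = [(i + 0.5) / n for i in range(n)]
    return sum(z(omega, u) == 0.0 for u in us) / n
```

The grid of midpoints stands in for the uniform law of $U$, so the returned fractions are the conditional probabilities $P(Z = 0 \mid \omega)$ in this toy model.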

We  will  also  make  extensive  use  of  the  following  two  lemmas. 

Lemma 2.2.2. ([5, Lemma 2.11]). Let $S$ and $T$ be Polish spaces and $Q$ a law on $S \otimes T$,
with marginal $\mu$ on $S$. Let $(\Omega, \mathcal{S}, P)$ be a probability space and $X$ be a random variable
on $\Omega$ with values in $S$ and law $\mathcal{L}(X) = \mu$. Assume that there is a random variable $U$ on $\Omega$,
independent of $X$, with values in a separable metric space $V$ and law $\mathcal{L}(U)$ on $V$ having no
atoms. Then there exists a random variable $Y$ on $\Omega$ with values in $T$ and $\mathcal{L}((X, Y)) = Q$.

Lemma 2.2.3. ([2, Lemma A1] = [5, Lemma 2.13]). Let $X$, $Y$ and $Z$ be Polish spaces.
Suppose $\mu$ is a law on $X \otimes Y$ and $\nu$ a law on $Y \otimes Z$ such that $\mu$ and $\nu$ have the same
marginal on $Y$. Then there is a law on $X \otimes Y \otimes Z$ with marginals $\mu$ on $X \otimes Y$ and $\nu$ on
$Y \otimes Z$.

Combining  these  two  lemmas  we  obtain 

Lemma 2.2.4. Let $R$, $S$ and $T$ be Polish spaces and let $\nu$ be a law on $S \otimes T$. Let $(\Omega, \mathcal{S}, P)$
be a probability space and let $X$ and $Y$ be random variables with values in $R$ and $S$
respectively, such that $\mathcal{L}(Y)$ is the marginal of $\nu$ on $S$. Assume that there is a random
variable $U$ on $(\Omega, \mathcal{S}, P)$, uniformly distributed over $[0, 1]$ and independent of $Y$. Then there
exists a random variable $Z$ on $(\Omega, \mathcal{S}, P)$ such that $\mathcal{L}((Y, Z)) = \nu$.

Combining  Lemma  2.2.2  with  [2,  Lemma  2.2]  and  the  Strassen-Dudley  theorem  [4, 
Theorem  11.6.2]  we  obtain 

Lemma 2.2.5. Let $X$ be a random variable with values in $\mathbb{R}^d$ and characteristic function
$f$. Let $G$ be a distribution on $\mathbb{R}^d$ with Fourier transform $g$. Assume there exists a random
variable $U$ independent of $X$. Then there exists a random variable $Y$ with distribution $G$
such that

$P(|X - Y| > \alpha) \le \alpha$

where

$\alpha = (T/\pi)^d \int_{|u| \le T} |f(u) - g(u)|\, du + G(x : |x| \ge \tfrac{1}{2} T) + 16 d T^{-1} \log T$

provided that $T \ge 10^8 d$.

2.2.1.  The  discrete  case. 

In this section we make heavy use of the ideas developed in [2, Section 2.3.1]. We
first prove Theorem 1 under the additional hypothesis that $\mathcal{G}$ is generated by a countable
partition.

Let $\varepsilon > 0$ to be chosen suitably later. By (2.1.2), Fubini’s theorem and Markov’s
inequality

(2.2.1)  $\int_{|u| \le T} \bigl|E\{\exp(i\langle u, X\rangle) \mid \mathcal{G}\} - g(u \mid \mathcal{G})\bigr|\, du \le \varepsilon (2T)^d$

except on a set $A_1 \in \mathcal{G}$ with $P(A_1) \le \lambda/\varepsilon$. Similarly, by (2.1.3)

(2.2.2)  $G(x : |x| > \tfrac{1}{4} T \mid \mathcal{G}) \le \delta^{1/2}$

except on a set $A_2 \in \mathcal{G}$ with $P(A_2) \le \delta^{1/2}$. Put $\eta = \lambda/\varepsilon + \delta^{1/2}$ and let $A = A_1 \cup A_2$. Then the
exceptional set $A \in \mathcal{G}$ and has probability $P(A) \le \eta$.

Let $D$ be any of the countably many atoms of $\mathcal{G}$ and keep it fixed. Let $\mathcal{S}^{(D)}$ denote
the trace of $\mathcal{S}$ on $D$ and define $P_D$ by

(2.2.3)  $P_D(E) = P(E \mid D)$,  $E \in \mathcal{S}^{(D)}$.

Note that $X I_D$ and $U I_D$ are still $P_D$-independent and that the $P_D$-distribution of $U I_D$ is
still uniform. On $D$ the conditional characteristic function

$E\{\exp(i\langle u, X\rangle) \mid \mathcal{G}\} = \frac{1}{P(D)} \int_D \exp(i\langle u, X\rangle)\, dP = f(u)$, say,

is a non-random function in $u$ and can be interpreted as the Fourier transform of the
$P_D$-distribution of $X$. Similarly, on $D$, the conditional characteristic function $g(u \mid \mathcal{G})$ as well
as $G(\cdot \mid \mathcal{G})$ are non-random. We denote them by $g(u)$ and $G(\cdot)$ respectively.

Thus, on the set $D$, either both (2.2.1) and (2.2.2) hold, in which case $D \subset A^c$, or
else one of these two conditions fails, in which case $D \subset A$. Assume first that $D \subset A^c$.
Then by (2.2.1)-(2.2.3)

$\int_{|u| \le T} |f(u) - g(u)|\, du \le \varepsilon (2T)^d$

and

$G(x : |x| > \tfrac{1}{4} T) \le \delta^{1/2}$.

Hence by Lemma 2.2.5 there exists a random variable $Y$ on $(D, \mathcal{S}^{(D)}, P_D)$ such that

(2.2.4)  $P_D(Y \in B) = G(B)$  for all Borel sets $B \subset \mathbb{R}^d$

and that

(2.2.5)  $P_D(|X - Y| > \beta) \le \beta$  if $D \subset A^c$

where

(2.2.6)  $\beta = 16 d T^{-1} \log T + \varepsilon T^{2d} + \delta^{1/2}$.

If on the other hand $D \subset A$ we choose $Y$ with $P_D$-distribution $G$ but arbitrary otherwise.
Thus

(2.2.7)  $P_D(|X - Y| > \beta) \le 1$  if $D \subset A$.

As $D$ runs through all the atoms of $\mathcal{G}$ we obtain a random variable $Y$ defined on
the whole space $(\Omega, \mathcal{S}, P)$ such that the conditional law $\mathcal{L}(Y \mid \mathcal{G}) = G(\cdot \mid \mathcal{G})$. Moreover,
summing the relations (2.2.5) and (2.2.7) over all $D \in \mathcal{G}$ we obtain by (2.2.4)

$P(|X - Y| > \beta) \le \beta + \eta$.

We choose $\varepsilon = \lambda^{1/2} T^{-d}$ and obtain in view of (2.2.6) a result slightly stronger than
claimed in (2.1.5) and (2.1.6).
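
The arithmetic behind this choice can be spelled out as follows. This display is our reconstruction rather than one taken from the source; it assumes $\lambda > 0$ (so that $\varepsilon > 0$), with $\beta$ the quantity in (2.2.6) and $\eta = \lambda/\varepsilon + \delta^{1/2}$ the bound on the probability of the exceptional set.

```latex
\begin{align*}
P(|X - Y| > \beta)
  &\le \beta + \eta
   = 16 d T^{-1} \log T + \varepsilon T^{2d} + \delta^{1/2}
     + \frac{\lambda}{\varepsilon} + \delta^{1/2}, \\
\varepsilon = \lambda^{1/2} T^{-d}
  &\;\Longrightarrow\;
   \varepsilon T^{2d} = \frac{\lambda}{\varepsilon} = \lambda^{1/2} T^{d}, \\
\beta + \eta
  &= 16 d T^{-1} \log T + 2 \lambda^{1/2} T^{d} + 2 \delta^{1/2} = \alpha .
\end{align*}
```

Since $\beta \le \alpha$, the event $\{|X - Y| > \alpha\}$ is contained in $\{|X - Y| > \beta\}$, so the bound by $\alpha$ carries over to (2.1.5); this is one sense in which the result obtained is slightly stronger than claimed.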

2.2.2.  The  general  case. 

Since $\mathcal{G}$ is countably generated there exists a real-valued random variable $W$ such that
$\mathcal{G} = \sigma(W)$. For $n = 1, 2, \ldots$ let $W_n$ denote the discrete random variable defined by

$$W_n := \sum_{-\infty < k < \infty} k 2^{-n}\, 1\{k 2^{-n} \le W < (k+1) 2^{-n}\}$$

and let $\mathcal{G}_n = \sigma(W_n)$. Let $G(\cdot \mid \mathcal{G}_n)$ denote the $\mathcal{G}_n$-measurable regular conditional distribution
defined by

$$G(B \mid \mathcal{G}_n) = E\{G(B \mid \mathcal{G}) \mid \mathcal{G}_n\} \quad \text{a.s.}$$

for $B \in \mathcal{R}^d$. (Note that the verification of this as well as of several of the following
claims is particularly easy if Lemma 2.2.1 is used.) Let $g(u \mid \mathcal{G}_n)$ denote the corresponding
conditional characteristic function

$$g(u \mid \mathcal{G}_n) = \int \exp(i\langle u, x\rangle)\, G(dx \mid \mathcal{G}_n) = E\{g(u \mid \mathcal{G}) \mid \mathcal{G}_n\}.$$

Since $\mathcal{G}_n \subset \mathcal{G}$, [2, Lemma 2.6] shows that conditions (2.2.1) and (2.2.2) are satisfied with
$\mathcal{G}_n$ taking the place of $\mathcal{G}$. By the result of Section 2.2.1 there exists an $\mathbb{R}^d$-valued random
variable $Y_n$ such that

$G(\cdot \mid \mathcal{G}_n)$ is a conditional distribution of $Y_n$ given $\mathcal{G}_n$

and

(2.2.8)  $P(|X - Y_n| > \alpha) < \alpha$

where $\alpha$ is given in (2.1.6).
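
The dyadic discretization $W_n$ used in this section can be illustrated numerically. The following is only a minimal sketch: the function name `dyadic_level` and the sample value `w = 0.7` are hypothetical and not part of the paper. It rounds a real value down to the grid $k 2^{-n}$ and checks that a coarser level is a function of any finer level, which is what makes the fields $\sigma(W_n)$ increase with $n$.

```python
import math

def dyadic_level(w: float, n: int) -> float:
    """Return W_n = k * 2**-n for the integer k with k*2**-n <= w < (k+1)*2**-n."""
    scale = 2 ** n
    return math.floor(w * scale) / scale

w = 0.7  # hypothetical observed value of W
levels = [dyadic_level(w, n) for n in range(1, 6)]

# W_k is a function of W_n whenever k <= n, so sigma(W_k) is contained in sigma(W_n).
coarser_from_finer = all(
    dyadic_level(dyadic_level(w, n), k) == dyadic_level(w, k)
    for n in range(1, 6)
    for k in range(1, n + 1)
)
```

In this sketch the approximation error $W - W_n$ always lies in $[0, 2^{-n})$, so $W_n \to W$ pointwise.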

We now show that the sequence $\{\mathcal{L}(W, Y_n), n \ge 1\}$ of joint laws of $W$ and $Y_n$ converges
weakly as $n \to \infty$. To see this first note that for $j = 0, \pm 1, \pm 2, \ldots$, and $k = 1, 2, \ldots$ the
events

$$\{W \in [j 2^{-k}, (j+1) 2^{-k})\} \in \mathcal{G}_k.$$

Thus for each dyadic interval $I_k$ of rank $k$, for all $n \ge k$ and for all $B \in \mathcal{R}^d$

(2.2.9)  $P(W \in I_k, Y_n \in B) = \int_{\{W \in I_k\}} G(B \mid \mathcal{G}_n)\,dP = \int_{\{W \in I_k\}} G(B \mid \mathcal{G})\,dP.$

This proves the claim. It follows that the sequence $\{\mathcal{L}(X, W, Y_n), n \ge 1\}$ is a tight family
of probability measures on $\mathbb{R}^{2d+1}$. Hence there exists a subsequence $\{n'\}$ such that

$$\mathcal{L}(X, W, Y_{n'}) \Rightarrow Q$$

for some probability measure $Q$ on $\mathbb{R}^{2d+1}$. Since $\mathcal{L}(X, W)$ is a marginal of $Q$ it follows from
Lemma 2.2.2 that there exists an $\mathbb{R}^d$-valued random variable $Y$ such that $\mathcal{L}(X, W, Y) = Q$.
(2.2.9) implies that $G(\cdot \mid \mathcal{G})$ is a conditional distribution of $Y$ given $W$, and (2.2.8) implies

$$P(|X - Y| > \alpha) \le \liminf_{n \to \infty} P(|X - Y_n| > \alpha) \le \alpha.$$


2.2.3. Proof of Remark 2.2. The following lemma and its proof are minor modifications
of [2, Lemma 2.2] and the proof given there.

Lemma 2.2.6. Let $X$ be an $\mathbb{R}^d$-valued random variable with distribution $F$ and characteristic
function $f$. Let $g$ be a characteristic function on $\mathbb{R}^d$. Moreover, suppose that there is a
random variable $U$, uniformly distributed over $[0, 1]$ and independent of $X$. Let $r$ and $T$ be
positive numbers. Then there exists an ($\mathbb{R}^d$-valued) random variable $Y$ with characteristic
function $g$ such that

P(\X  -  Y\  >  r)  <  (£)  j  | /(«)  -  sMIrfu  +  fXH  >  jT)  4  «(r,  T) 

where  a(r,  T)  is  defined  in  (2.1.9). 

Proof. We follow the proof of [2, Lemma 2.2] until [2, (2.2.4)]. If $G$ denotes the distribution
associated with $g$, then by the argument proving [2, (2.2.1)] we obtain

nB)<G(B')+(Z)  f  l/W-sWMa  +  fdxl  >i T) 

'  \*y  J |««|<r  ^ 

4  ^(1*1  >  \t)  4  |x|  >  |r)  4  (0  j  \h(n)\iu 

for  all  Borel  sets  B  C  'R.d.  We  choose  H  as  on  [2, p.  36]  with  a1  =  if  r  <T  and  <r2  as  1 
if  r  >  T.  We  then  apply  the  Strassen-Dudley  theorem  and  Lemma  2.2.2  and  obtain  the 
result. 

To  finish  the  proof  of  Remark  2.2  we  follow  Section  2.2.1  until  (2.2.4).  We  now  apply 
Lemma  2.2.6  and  obtain,  instead  of  (2.2.5) 

P{\X  -  Y\)  >r,D)<  P{D)(Tue  +  A*  +  a(r,r»  if  D  C  4C. 

As in Section 2.2.1 we sum over all $D \in \mathcal{G}$, choose $\varepsilon = \lambda^{1/2} T^{-d}$ and obtain the result. The
changes in Section 2.2.2 are minor.

2.3.  Proof  of  Theorem  4. 

Again we first prove Theorem 4 under the additional hypothesis that $\mathcal{G}$ is generated
by a countable partition. The proof makes use of sketches of proofs of unconditional results
given in several earlier papers (see [10, Theorem 3.4] and its history given there).

Let $D$ be any of the countably many atoms of $\mathcal{G}$ and note that on each $D$ both
$P(X \in A \mid \mathcal{G}) = P(X \in A \mid D)$ and $G(A^{\alpha]} \mid \mathcal{G}) = G(A^{\alpha]} \mid D)$
are non-random. Hence we can rewrite (2.1.13) in the form

(2.3.1)  $\sum_{D \in \mathcal{G}} P(D)\,\varepsilon(D) \le \beta$

where we set

(2.3.2)  $\varepsilon(D) = \sup_{A \in \mathcal{B}} \left( P(X \in A \mid D) - G(A^{\alpha]} \mid D) \right).$

In  the  context  of  [10,  Theorem  3.4]  the  usefulness  of  this  observation  for  obtaining  sharp 
constants  was  pointed  out  to  us  by  Erich  Berger  [1].  We  thank  him  for  this  remark. 

For the moment keep $D$ fixed. We shall construct $Y$ on each $D$ separately. Define for
all $A \in \mathcal{B}$

$$P_1(A) = P(X \in A \mid D) \quad \text{and} \quad P_2(A) = G(A \mid D).$$

Then by (2.3.2) with $\varepsilon = \varepsilon(D)$

$$P_1(A) \le P_2(A^{\alpha]}) + \varepsilon, \quad \text{for all } A \in \mathcal{B}.$$

Hence by the Strassen-Dudley theorem [4, Theorem 11.6.2] there exists a probability
measure $Q = Q_D$ on $B \otimes B$ with marginals $P_1$ and $P_2$ such that

$$Q_D\{(x, y) : m(x, y) > \alpha\} \le \varepsilon.$$

Hence by Lemma 2.2.2 there exists a random variable $Y$ on $(D, \mathcal{S}_D, P_D)$ such that
$\mathcal{L}(X, Y) = Q_D$, where $\mathcal{S}_D$ and $P_D$ are defined in (2.2.3) above. It follows that

(2.3.3)  $P(m(X, Y) > \alpha, D) \le \varepsilon(D) P(D).$

As $D$ runs through all atoms of $\mathcal{G}$ we obtain a random variable $Y$ defined on the whole
space $(\Omega, \mathcal{S}, P)$. We sum (2.3.3) over all sets $D$ and obtain, in view of (2.3.1)

$$P(m(X, Y) > \alpha) \le \sum_{D \in \mathcal{G}} P(D)\,\varepsilon(D) \le \beta.$$

We also note that (2.1.4) holds since $P_2$, the second marginal of $Q_D$, is the $P_D$-distribution
of $Y$. This proves Theorem 4 in the case that $\mathcal{G}$ is generated by a countable partition.
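
The way the atom-wise bounds (2.3.3) combine into a single bound can be checked on a toy partition. This is only an illustrative sketch: the three atoms and their values $(P(D), \varepsilon(D))$ are hypothetical, and `overall_bound` is not a name from the paper. It shows that an atom with a poor bound contributes little when its probability is small.

```python
from fractions import Fraction

def overall_bound(atoms):
    """Return the sum of P(D) * eps(D) over the atoms D of a countable partition."""
    total_mass = sum(p for p, _ in atoms)
    if total_mass != 1:
        raise ValueError("atom probabilities must sum to one")
    return sum(p * eps for p, eps in atoms)

# Hypothetical partition, one pair (P(D), eps(D)) per atom.
atoms = [
    (Fraction(1, 2), Fraction(1, 10)),
    (Fraction(1, 3), Fraction(1, 5)),
    (Fraction(1, 6), Fraction(1, 1)),  # trivial bound on a small atom
]
bound = overall_bound(atoms)
```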

The  proof  of  the  general  case  can  be  easily  modeled  after  Section  2.2.3. 

2.4.  Proof  of  Remark  2.8. 

We  concentrate  only  on  one  of  Eberlein’s  results,  namely  on  [7,  Theorem  2].  We  first 
prove  the  following  lemma. 

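Lemma 2.4.1. Let $X$ and $W$ be random variables defined on some probability space $(\Omega, \mathcal{S}, P)$
and with values in a Polish space $B$. Let $\mathcal{F}$ and $\mathcal{G}$ be sub-$\sigma$-fields in $\mathcal{S}$ and assume that $\mathcal{F}$
is non-atomic. Suppose there exist two positive numbers $\varepsilon$ and $\lambda$ such that for each pair
of sets $D \in \mathcal{F}$ and $E \in \mathcal{G}$ with $P(D) = P(E)$ the following relation holds:

(2.4.1)  $P(X \in A \mid D) \le P(W \in A^{\varepsilon]} \mid E) + \lambda$, for all $A \in \mathcal{B}$.

Here $\mathcal{B}$ denotes the Borel field of $B$. Then with probability one

(2.4.2)  $\sup_{A \in \mathcal{B}} \{P(W \in A \mid \mathcal{G}) - P(W \in A^{2\varepsilon]})\} \le 3\lambda.$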

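Proof. Let $E \in \mathcal{G}$ be any set and let $a := P(E) > 0$. Choose integers $n \ge 1/(a\lambda)$ and $0 \le k \le n$
such that $0 \le a - k/n < 1/n \le a\lambda$. Partition $\Omega$ into $n$ sets $D_1, \ldots, D_n \in \mathcal{F}$ with $P(D_j) = 1/n$,
$1 \le j \le n$. For any subset $M$ of $k$ integers $j$, $1 \le j \le n$, choose a set $D_M^* \in \mathcal{F}$ disjoint
from $\bigcup_{j \in M} D_j$ with $P(D_M^*) = a - k/n$. By a well-known argument (2.4.1) implies

$$P(W \in A \mid E) \le P(X \in A^{\varepsilon]} \mid D) + \lambda, \quad \text{for all } A \in \mathcal{B}$$

and so with $D = \bigcup_{j \in M} D_j \cup D_M^*$

(2.4.3)  $P(W \in A, E) \le \sum_{j \in M} P(X \in A^{\varepsilon]}, D_j) + 2a\lambda, \quad A \in \mathcal{B}.$

We sum (2.4.3) over all $\binom{n}{k}$ possible subsets $M$ and obtain

$$\binom{n}{k} P(W \in A, E) \le \binom{n-1}{k-1} P(X \in A^{\varepsilon]}) + 2a\lambda \binom{n}{k}, \quad A \in \mathcal{B}.$$

Dividing by $a\binom{n}{k}$ and applying (2.4.1) with $D = E = \Omega$, and $A^{\varepsilon]}$ instead of $A$, we get

(2.4.4)  $P(W \in A \mid E) \le P(W \in A^{2\varepsilon]}) + 3\lambda$ for all $A \in \mathcal{B}$.

Now fix $A \in \mathcal{B}$ and let $E = \{P(W \in A \mid \mathcal{G}) - P(W \in A^{2\varepsilon]}) > 3\lambda\}$. Then (2.4.4) implies
$P(E) = 0$. Since the supremum on the LHS of (2.4.2) needs to be extended only over
countably many sets $A \in \mathcal{B}$ we obtain the result.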

We now recall Eberlein's [7, Theorem 2]: Let $\{B_k, m_k, k \ge 1\}$ be a sequence of Polish
spaces, let $\{X_k, k \ge 1\}$, $\{W_k, k \ge 1\}$ be two sequences of random variables, defined on
$(\Omega, \mathcal{S}, P)$ and with $X_k$ and $W_k$ assuming values in $B_k$, $k \ge 1$. Let $\{\mathcal{F}_k, k \ge 1\}$ and
$\{\mathcal{G}_k, k \ge 1\}$ be two non-decreasing sequences of sub-$\sigma$-fields of $\mathcal{S}$ and assume that $\mathcal{F}_k$ is
non-atomic, $X_k$ is $\mathcal{F}_k$-measurable and $W_k$ is $\mathcal{G}_k$-measurable for each $k \ge 1$. Suppose there
exist sequences $\{\varepsilon_k, k \ge 1\}$, $\{\lambda_k, k \ge 1\}$ of positive numbers such that for each pair of sets
$D \in \mathcal{F}_{k-1}$, $E \in \mathcal{G}_{k-1}$ with $P(D) = P(E)$,

(2.4.5)  $P(X_k \in A \mid D) \le P(W_k \in A^{\varepsilon_k]} \mid E) + \lambda_k$ for all $A \in \mathcal{B}_k$.

Here $\mathcal{B}_k$ is the Borel $\sigma$-field over $B_k$. Let us finally assume that there exists a random
variable $U$, uniformly distributed over $[0, 1]$ and independent of $\mathcal{F}_\infty \vee \mathcal{G}_\infty$. Under these
assumptions Eberlein [7] proves that there exists a sequence $\{Z_k, k \ge 1\}$ with the same
law as $\{W_k, k \ge 1\}$ such that

(2.4.6)  $P\{m_k(X_k, Z_k) > 3\varepsilon_k\} \le \lambda_k, \quad k \ge 1.$

What we claim is that under these hypotheses one can do better. Namely, one can
approximate $\{X_k, k \ge 1\}$ by a sequence $\{Y_k, k \ge 1\}$ of independent random variables with
$\mathcal{L}(Y_k) = \mathcal{L}(W_k)$, for each $k \ge 1$.

To see this note that by Lemma 2.4.1 we have with probability 1

$$P(W_k \in A \mid \mathcal{G}_{k-1}) \le P(W_k \in A^{2\varepsilon_k]}) + 3\lambda_k, \quad A \in \mathcal{B}_k$$

and  hence  by  Theorem  6  with /Hk-i  =  ,Yk-i),k  >  1  and  G,  the  trivial  <r-field, 

there exists a sequence $\{Y_k^*, k \ge 1\}$ of independent random variables with $\mathcal{L}(Y_k^*) = \mathcal{L}(W_k)$,
$k \ge 1$ such that

(2.4.7)  $P\{m_k(Y_k^*, W_k) > 2\varepsilon_k\} \le 3\lambda_k, \quad k \ge 1.$




As  a  matter  of  fact  [10,  Theorem  3.4]  would  in  essence  yield  the  same  conclusion. 

Let $B = \bigotimes_k B_k$. Consider the law $\mathcal{L}(\{W_k\}, \{Y_k^*\})$ on $B \otimes B$. Since $\mathcal{L}(\{W_k\}) = \mathcal{L}(\{Z_k\})$
we obtain from Lemma 2.2.4 a random variable $\{Y_k, k \ge 1\}$ such that $\mathcal{L}(\{Z_k\}, \{Y_k\}) = \mathcal{L}(\{W_k\}, \{Y_k^*\})$.
Hence by (2.4.6) and (2.4.7) we obtain

(2.4.8)  $P\{m_k(X_k, Y_k) > 5\varepsilon_k\} \le 4\lambda_k, \quad k \ge 1.$

Since $\mathcal{L}(\{Y_k\}) = \mathcal{L}(\{Y_k^*\})$ the sequence $\{Y_k, k \ge 1\}$ is a sequence of independent
random variables with $\mathcal{L}(Y_k) = \mathcal{L}(W_k)$, $k \ge 1$.

We would like to add in passing that the existence of a sequence $\{Y_k, k \ge 1\}$ of independent
random variables approximating the sequence $\{X_k, k \ge 1\}$ and satisfying (2.4.8)
can be derived directly and more easily from (2.4.5) by using Theorem 6 or [10, Theorem 3.4]
under the additional hypothesis that one of the $\mathcal{G}_k$'s contains sets of arbitrarily
small measure. The argument is similar to the one given in the proof of Lemma 2.4.1.

3.  A strong approximation theorem for Hilbert space valued
martingales.

3.1.  Statement  of  theorem. 

Let $\{x_n, \mathcal{L}_n, n \ge 1\}$ be a square integrable martingale difference sequence defined
on some probability space $(\Omega, \mathcal{S}, P)$ and with values in a real separable Hilbert space
$(H, \langle \cdot, \cdot \rangle, |\cdot|)$. Suppose that $(\Omega, \mathcal{S}, P)$ supports a random variable $U$, uniformly distributed
over $[0, 1]$ and independent of $\{x_n, n \ge 1\}$. We denote the conditional expectation operator
$E\{\cdot \mid \mathcal{L}_{n-1}\}$ by $E_n(\cdot)$. Let $\sigma_n$ be the conditional covariance operator of $x_n$ given $\mathcal{L}_{n-1}$,
defined by

$$\sigma_n(u) := E_n(\langle u, x_n \rangle x_n), \quad u \in H$$

and let

$$\mathrm{tr}(\sigma_n) := \sum_{i \ge 1} \langle \sigma_n e_i, e_i \rangle = E_n |x_n|^2$$

be its trace. Here $\{e_i, i \ge 1\}$ is a complete orthonormal basis for $H$. We write

$$A_n := \sum_{i \le n} \sigma_i$$

and put

$$V_n := \mathrm{tr}(A_n) = \sum_{i \le n} E_i |x_i|^2.$$
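
The identity between the trace of a covariance operator and the second moment of the norm can be checked exactly in finite dimensions. The sketch below is only an illustration: the four-point mean-zero distribution on $\mathbb{R}^2$ and the helper names `covariance`, `trace` and `second_moment` are hypothetical. In the standard basis the operator $u \mapsto E(\langle u, x\rangle x)$ has matrix entries $E(x_i x_j)$.

```python
from fractions import Fraction

# Hypothetical mean-zero distribution on R^2, given as (probability, vector) pairs.
dist = [
    (Fraction(1, 4), (2, 0)),
    (Fraction(1, 4), (-2, 0)),
    (Fraction(1, 4), (0, 1)),
    (Fraction(1, 4), (0, -1)),
]

def covariance(dist):
    """Matrix with entries E(x_i * x_j) of the operator u -> E(<u, x> x)."""
    d = len(dist[0][1])
    return [[sum(p * x[i] * x[j] for p, x in dist) for j in range(d)] for i in range(d)]

def trace(matrix):
    """Sum of the diagonal entries, i.e. the sum of <sigma e_i, e_i> over the basis."""
    return sum(matrix[i][i] for i in range(len(matrix)))

def second_moment(dist):
    """E |x|^2 for the Euclidean norm."""
    return sum(p * sum(c * c for c in x) for p, x in dist)

cov = covariance(dist)
```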

For each $\omega \in \Omega$ let $\mu_\omega$ be a mean zero measure on $H$ such that $\int_H |x|^2 \mu_\omega(dx) < \infty$.
Moreover, suppose that the map $T : H \times \Omega \to H$ defined by

$$T(u, \omega) = \int_H \langle u, x \rangle x\, \mu_\omega(dx), \quad u \in H,\ \omega \in \Omega$$

is measurable. We call $T$ a random covariance operator. We define further a seminorm
$\|\cdot\|$ on linear operators $B : H \to H$ by

$$\|B\| = \sup_{u \in H,\ \|u\| = 1} |\langle B(u), u \rangle|$$

and observe that if $T$ is a random covariance operator then $\|T\|$ is a random variable.

With this notation we have the following extension of [9, Theorem 1] and of [11,
Theorem 2].

Theorem 7. Let $\{x_n, \mathcal{L}_n, n \ge 1\}$ be a square-integrable martingale difference sequence with
values in a real separable Hilbert space $H$ of dimension $d \le \infty$, and defined on $(\Omega, \mathcal{S}, P)$.
Let  /  be  a  non-decreasing  function  with  f(x )  — ►  oo  as  x  — *  oo,  and  such  that  js 

non-increasing for some $\alpha > 50d$. (If $d = \infty$ we interpret this last condition to mean that
it holds for all large $\alpha$.) Suppose that $V_n \to \infty$ a.s. and that

(3.1.1)  $D := \sum_{n \ge 1} E\{|x_n|^2\, 1\{|x_n|^2 > f(V_n)\} / f(V_n)\} < \infty.$

Moreover, suppose that there exists some covariance operator $A$, measurable with
respect to $\mathcal{L}_k$ for some $k \ge 0$, and some $0 < \rho \le 1$ such that

(3.1.2)  $E \sup_{n \ge 1} \left( \|A_n - A V_n\| / f(V_n) \right)^\rho < \infty.$

Finally, let $I$ be an arbitrary non-singular, non-random covariance operator.

Then there exists a sequence $\{y_n, n \ge 1\}$ of i.i.d. Gaussian $H$-valued random variables,
defined on $(\Omega, \mathcal{S}, P)$, with mean zero and covariance operator $I$, and independent of $A$ such
that with probability 1


n>l  m<< 


o  (t*(/(0/*)p/5M)  if  d  <  oo 
o((<loglog<)*)  if  d  =  oo. 


Remark 3.1. For $d = 1$ our result is somewhat weaker than Strassen's [12] because our class
of functions is somewhat smaller. Moreover, instead of (3.1.1), Strassen only assumed the
almost sure convergence of the series in (3.1.1) with $E(\cdot)$ replaced by $E(\cdot \mid \mathcal{L}_{n-1})$.

Remark 3.2. Collecting the probability bounds before the Borel-Cantelli lemma is applied
we can obtain for $d < \infty$


y~!  Xnl{Vi,  <  f}  —  ^  ym 

«>1  m<< 


>(i  (/(()/<)  ^ 


« (/(<)/<)  *  • 


Here $\{y_m, m \ge 1\}$ is a sequence of i.i.d. $N(0, I)$ random vectors independent of $A$, and $I$
denotes the identity matrix.

Remark 3.3. Influenced by Lévy's proof of the CLT for martingale differences (see e.g. [3,
pp. 498-501]) one of our initial goals was to establish strong approximations of the type

(3.1.3)  $\left| \sum_{k \le n} x_k - \sum_{k \le n} \sigma_k^{1/2} y_k \right| = o((V_n \log\log V_n)^{1/2})$ a.s.

or

(3.1.4)  $\left| \sum_{k} x_k 1\{V_k \le t\} - \sum_{k} \sigma_k^{1/2} y_k 1\{V_k \le t\} \right| = o((t \log\log t)^{1/2})$ a.s.

where $\{y_k, k \ge 1\}$ is a sequence of i.i.d. standard normal random variables, independent of
the sequence $\{\sigma_k, k \ge 1\}$. But even if $d = 1$ neither (3.1.3) nor (3.1.4) can hold in general.
To see this let $\{r_k, k \ge 1\}$ be the sequence of Rademacher functions, i.e., $P(r_k = \pm 1) = \frac{1}{2}$,
$\mathcal{L}_k = \sigma(r_1, \ldots, r_k)$ and $x_k = (1 + r_{k-1})^{1/2} r_k$. Then $\sigma_k = E(x_k^2 \mid \mathcal{L}_{k-1}) = 1 + r_{k-1}$. Write
as above $V_n = \sum_{k \le n} \sigma_k$. If $\{y_k\}$ is independent of $\{\sigma_k\}$, then $\{y_k\}$ is also independent of
$\{r_k\}$ and so

(3.1.5)  $x_k - \sigma_k^{1/2} y_k = \sigma_k^{1/2} (r_k - y_k), \quad k \ge 1$

is a martingale difference sequence with respect to the natural filtration. Hence (3.1.5)
satisfies the LIL with quadratic variation $2V_n$. This contradicts both (3.1.3) and (3.1.4).
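
The conditional moments in this Rademacher example can be verified by direct enumeration. The sketch below assumes the reading $x_k = (1 + r_{k-1})^{1/2} r_k$, which is the form consistent with $\sigma_k = 1 + r_{k-1}$ and with (3.1.5); the function names are hypothetical. It checks that $x_k$ has conditional mean zero and conditional second moment $1 + r_{k-1}$ given $r_{k-1}$.

```python
import math

def x(r_prev: int, r_cur: int) -> float:
    """x_k = (1 + r_{k-1})**0.5 * r_k for Rademacher signs r_{k-1}, r_k in {-1, +1}."""
    return math.sqrt(1 + r_prev) * r_cur

def conditional_moments(r_prev: int):
    """Mean and second moment of x_k given r_{k-1}, averaging over r_k = -1, +1."""
    values = [x(r_prev, r_cur) for r_cur in (-1, 1)]
    mean = sum(values) / 2
    second = sum(v * v for v in values) / 2
    return mean, second

moments = {r_prev: conditional_moments(r_prev) for r_prev in (-1, 1)}
```

Since an independent standard normal $y_k$ adds $E y_k^2 = 1$ to $E r_k^2 = 1$, the difference $\sigma_k^{1/2}(r_k - y_k)$ has conditional variance $2\sigma_k$, which is where the quadratic variation $2V_n$ comes from.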

Of course if $d = 1$ we obtain for some i.i.d. $N(0, 1)$ sequence $\{y_j, j \ge 1\}$


xki{vk  <t}-Y,yj  =  °  a-s- 

j<t 

by  Strassen’s  theorem  [12]. 

[11,  Theorem  1]  and,  a  fortiori,  [6,  Theorem  1]  easily  extend  to  the  case  of  random 
covariance  operators  T. 

Theorem 8. Let $\{\xi_j, j \ge 1\}$ be a sequence of random variables, defined on $(\Omega, \mathcal{S}, P)$, with
values in a real separable Hilbert space $H$ of dimension $d \le \infty$ and with


for some $\delta > 0$. Let $U$ be a random variable independent of $\{\xi_j, j \ge 1\}$ and let $\{\mathcal{M}_j, j \ge 1\}$
be a non-decreasing sequence of $\sigma$-fields such that $\xi_j$ is $\mathcal{M}_j$-measurable for each $j \ge 1$.
Denote

$$S_n(m) := \sum_{j = m+1}^{m+n} \xi_j$$

and for $m \ge 0$ and $n \ge 1$ define the conditional covariance operators $C(n, m)$ by

$$C(n, m; u) := E\{\langle u, S_n(m) \rangle S_n(m) \mid \mathcal{M}_m\}, \quad u \in H.$$

Suppose that there exist $\theta > 0$ and $\rho > 0$ such that uniformly in $m \ge 0$

£|B(S»(m)  |  Mm)  f 

and  suppose  that  there  exists  a  (possibly  random)  covariance  operator  T,  measurable  with 
respect  to  some  Mj,  j  >  1  such  that  uniformly  in  m  >  0 

-E?||C7(n,  m)  —  nT||  <  nl~6 . 

Finally, let $I$ be an arbitrary non-singular, non-random covariance operator.

Then there exists a sequence $\{y_n, n \ge 1\}$ of i.i.d. Gaussian $H$-valued random variables,
defined on $(\Omega, \mathcal{S}, P)$ with mean zero, covariance operator $I$, and independent of $T$ such
that with probability 1

0  if  d  <  oo  • 

o  ^(nloglogn)ij  ifd=oo. 

Here $\lambda > 0$ is a constant depending only on $d$, $\delta$, $\rho$ and $\theta$.

3.2.  Proof  of  Theorem  7. 

The proof of Theorem 7 follows in essence the proof of [9, Theorem 1] except that for
the construction of the random variables $\{y_j, j \ge 1\}$ Theorem 3 instead of [2, Theorem 1]
will be applied. Throughout the proof we shall use the same notation as in [9], wherever
possible.

We first observe that there is no loss of generality in assuming that $A$ is $\mathcal{L}_0$-measurable.


j<n  j<n 




3.2.1.  The case $d < \infty$.


Except  for  one  minor  change  we  follow  [9,  Section  2.1-2.3].  Starting  with  [9,  (2.1)]  we 
replace  d  by 


This will compensate for the fact that our hypothesis (3.1.8) is weaker than the corresponding
[9, (1.5)]. The changes in [9, Section 2.3] precipitated by this weakening of the
hypothesis [9, (1.5)] have been dealt with in [11, p. 230 to p. 231, line 4]. In this context
it is perhaps helpful to observe that the argument in [9, (2.11)] requires no change, because
$A$ is assumed to be $\mathcal{L}_0$-measurable. Hence [9, Proposition 1] remains valid with the
appropriate interpretation of $A$: As $k \to \infty$


(3.2.1) 


sup  E 
M  <k* 


E{exp(i(u,  Zk ))  |  ^k-i}  -  exp 


<  k~5d. 


Remark 3.4. For the proof of (3.2.1) the hypothesis that $A$ be $\mathcal{L}_0$-measurable is not needed.
As  a  matter  of  fact  the  same  argument  shows  that  if  Qk  denotes  the  cr-field  generated  by 
WjJ  <  r(fc)}  ^en 


sup  E 
M<*» 


E{exp(i{u,Zk))  |  -E  jexp  ^-^(u,Au)j  |  £*-1 J 


<  k~sd 


This together with some routine calculations implies the CLT with a mixture of Gaussian
distributions as limit.


We  now  apply  Theorem  3  to  the  sequence  {Xk,k  >  1}  =  {Zk,k  >  1},  Tk  =  k$, 
$\mathcal{G} = \sigma(A)$ and $g_k(u \mid \mathcal{G}) = \exp(-\frac{1}{2}\langle u, Au \rangle)$. We obtain a sequence $\{Y_k, k \ge 1\}$ of $\mathbb{R}^d$-valued
random variables, defined on $(\Omega, \mathcal{S}, P)$ with the following properties:

Conditional on $A$ the sequence $\{Y_k, k \ge 1\}$ is a sequence of independent random
variables with (conditional) characteristic function $\exp(-\frac{1}{2}\langle u, Au \rangle)$ such that

$$P(|Z_k - Y_k| > \alpha_k) < \alpha_k$$

with 

Qf*  <  k~&. 
Next we apply Lemma 2.2.4 with the random variables

$$X = \{Z_k, k \ge 1\}, \quad Y = (A, \{Y_k, k \ge 1\})$$

and  the  law 

=  jr,  Nj,k>l}) 

\  j=*k- 1  +  1 

defined  on  the  appropriate  Polish  spaces  7Zd°°,  7 Zd*  ®  7Zdo°  and  7Zd°°.  Here  <jt  and  hk  are 
defined  in  [9,  (2.1)]  and  {Nk,k  >  1}  is  a  sequence  of  i.i.d.  standard  Gaussian  ^-valued 
random  variables,  independent  of  A.  Since  the  marginal  on  H?  ®  7 Zdo°  of  u, 

equals  C{Y)  there  exists  a  random  variable  Z  =  {yj,j  >  1},  defined  on  (Q,S,P)  with 
7Zd- valued,  i.i.d.  standard  Gaussian  components  yj,  independent  of  A  such  that  for  all 
Jfe>l 

‘*+t  .  \ 

Zk-A^h^  ^2  yj  <<**• 

i+i  / 

Summing these relations over $k = 1, \ldots, M$ we obtain the analogue of [9, (2.20)]. The
remaining changes in [9, Section 2.4] are routine.

We note that if $A$ is invertible the proof of (3.2.2) can be simplified, because it is easily
checked  that  k  >  1}  is  a  sequence  of  i.i.d.  standard  Gaussian  7^ -valued  random 

variables. 

3.2.2.  The case $d = \infty$.

The proof of Theorem 7 in the infinite-dimensional case is almost identical to the proof
given in [9, Section 3], as amended in [11, pp. 231-232] to take care of the weakened
hypothesis (3.1.8). The idea behind the proof is this: One approximates the $H$-valued
martingale difference sequence $x_n$ by a finite dimensional martingale difference sequence
$\pi_k x_n$ of ever increasing dimension $d_k$, which, by the way, will be random. One then applies
the results of Section 3.2.1 to $\pi_k x_n$ to construct finite dimensional approximations of $\pi_k x_n$
by mixtures of Gaussian random variables. Finally, a bounded law of the iterated logarithm
for $x_n - \pi_k x_n$ is proven to show that the approximation errors are negligible.

As noted above, this program has been carried out in [9, Section 3] and [11, pp. 231-232].
There are three items which need attention. First, from the middle of [9, p. 247] on
a  factor  1  is  missing.  Second,  in  three  lines  in  the  lower  half  of  [9,  p.  247]  the  symbol 
QM  erroneously  got  omitted.  (Recall  that  ||  •  ||  denotes  the  seminorm  defined  in  [9,  p.  232, 
line  8]  and  not  the  operator  norm.)  Third,  for  random  A  the  estimate  of  III  in  [9,  p.  248, 
line 4] needs proof. In other words, we need to show that for all $\omega \in \Omega$

(3.2.3)  $\|Q_k A Q_k\| \to 0$

as $k \to \infty$. There are at least two ways to see this. By [13, p. 326, Remark] we can
approximate $A$ (for each fixed $\omega$) in the operator norm by a finite dimensional operator
on $H$ with finite-dimensional domain. Hence (3.2.3) follows. We thank Loren Pitt for
this remark. But (3.2.3) also can be proved by observing that for each fixed $\omega$, $A$ is the
covariance operator of some square integrable vector $\xi$, say, and that

->  o. 


3.3.  A  counterexample. 


We shall show now that in Theorem 7 the hypothesis that $A$ is $\mathcal{L}_k$-measurable for
some $k \ge 1$ cannot be omitted.

Let $([0, 1), \mathcal{B}, \lambda)$ be the unit interval with Lebesgue measure and let $\{r_n, n \ge 1\}$ be
the sequence of Rademacher functions, defined on $[0, 1)$. Let $\mathcal{L}_0$ be the trivial $\sigma$-field and
let $\mathcal{L}_n$ be the $\sigma$-field generated by the dyadic intervals of rank $2n$. For $n \ge 0$ define $2 \times 2$
random matrices

$$\sigma_{n+1}(\omega) = \begin{pmatrix} k 2^{-n} & 0 \\ 0 & 1 - k 2^{-n} \end{pmatrix} \quad \text{if } k 2^{-n} \le \omega < (k+1) 2^{-n},\ 0 \le k < 2^n.$$


Set 


n  >  1 
Then $\{x_n, \mathcal{L}_n, n \ge 1\}$ is a 2-dimensional martingale difference sequence. Now

$$A_n := \sum_{k \le n} \sigma_k$$

has trace $V_n = \mathrm{tr}\, A_n = n$. Thus for all $0 \le \omega < 1$ the limit

$$A := \lim_{n \to \infty} A_n / V_n = \begin{pmatrix} \omega & 0 \\ 0 & 1 - \omega \end{pmatrix}$$

exists.  As  a  matter  of  fact  both  conditions  (3.1.7)  and  (3.1.8)  are  satisfied  with  /(x)  =  il. 
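
The convergence of $A_n / V_n$ in this counterexample can be observed numerically. The sketch below assumes the reading $\sigma_{n+1}(\omega) = \mathrm{diag}(k 2^{-n}, 1 - k 2^{-n})$ for $k 2^{-n} \le \omega < (k+1) 2^{-n}$; the function names and the sample point $\omega = 0.3$ are hypothetical. Only the upper-left entry is tracked, since each matrix is diagonal with trace one.

```python
import math

def sigma_upper_left(omega: float, n: int) -> float:
    """Upper-left entry k * 2**-n of sigma_{n+1}(omega), where k*2**-n <= omega < (k+1)*2**-n."""
    scale = 2 ** n
    return math.floor(omega * scale) / scale

def normalized_upper_left(omega: float, n: int) -> float:
    """Upper-left entry of A_n / V_n = (sigma_1 + ... + sigma_n) / n."""
    return sum(sigma_upper_left(omega, m) for m in range(n)) / n

omega = 0.3  # hypothetical sample point in [0, 1)
approximation = normalized_upper_left(omega, 200)
```

Because $0 \le \omega - k 2^{-m} < 2^{-m}$ at level $m$, the accumulated error is at most $\sum_m 2^{-m} \le 2$, so in this sketch the upper-left entry of $A_n / V_n$ differs from $\omega$ by at most $2/n$.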

Suppose there exists a sequence $\{y_k, k \ge 1\}$ of i.i.d. 2-dimensional standard Gaussian
random variables, independent of $A$ such that with probability 1

$$\sum_{k \le n} x_k - A^{1/2} \sum_{k \le n} y_k = o((n \log\log n)^{1/2}), \quad n \to \infty$$

or  what  amounts  to  the  same 

fc<n  k<n 

Since  A  generates  B ,  the  sequence  {y*,  k  >  1}  is  independent  of  {xk,k  >  1}  and  thus 
of  >  1).  Hence  we  have  by  a  Fubini  type  argument  that  for  some  sequence 

$\{a_n, n \ge 1\}$ of constant 2-dimensional vectors

$= o((n \log\log n)^{1/2})$ a.s.

k<n 

But this must also hold for any independent copy $\{y_k', k \ge 1\}$ of $\{y_k, k \ge 1\}$ and so

$$\sum_{k \le n} (y_k - y_k') = o((n \log\log n)^{1/2}).$$

This contradicts the classical LIL.